NCHLT: isiZulu POS tag set

Tag set

 

For annotators' purposes, this tag set is largely adopted from Taljard et al. (2008) and from various documents compiled by G. Faasz and U. Heid (IMS, Stuttgart) and by D.J. Prinsloo and E. Taljard (University of Pretoria). The information below reflects the current state of the tag set; further development will probably require changes.

The tagset is mainly based on the lexical and morphological criteria defined by Lombard (1985) and Louwrens (1991). The logical structure of the tagset is divided into two layers of linguistic description (annotation levels):

The first annotation level (level 1) includes all mandatory, or, according to EAGLES, obligatory information, namely up to three elements: an element hinting at the word class, a second one specifying functional or syntactic properties, and a third one giving morphological specifics, cf. e.g. PRO(noun)EMP(hatic)PERS(on).

The second level of annotation (level 2) includes recommended and optional information. This level is in most cases used for a detailed description of closed class items described in the tagger lexicon. Compare the following excerpt:

 

Figure 1: Annotation levels

| Description | Tag 1st level (mandatory information) | Tag 2nd level (optional/recommended information) |
| --- | --- | --- |
| Pronouns: | | |
| emphatic personal | PROEMPPERS | 1sg, 2sg, 1pl, 2pl |
| Verbals: | V | tr |
| Morphemes: | | |
| deficient | MORPH | def |
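The two annotation levels can be separated mechanically when a tag is written in the underscore form used later in this document (for example `PROEMPPERS_1sg` or `V_tr`). The following is a minimal sketch under that assumption; the names `Tag` and `split_tag` are hypothetical and not part of the tag set specification.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    level1: str            # mandatory part, e.g. "PROEMPPERS"
    level2: Optional[str]  # optional/recommended part, e.g. "1sg"; None if absent


def split_tag(tag: str) -> Tag:
    """Split a tag at its first underscore into level 1 and level 2.

    Assumption: level 2 is attached to level 1 with a single underscore,
    as in the forms PROEMPPERS_1sg and V_tr listed in this document.
    """
    level1, sep, level2 = tag.partition("_")
    if not level1:
        raise ValueError(f"tag has no level-1 part: {tag!r}")
    return Tag(level1, level2 if sep else None)


if __name__ == "__main__":
    for example in ("PROEMPPERS_1sg", "V_tr", "MORPH"):
        print(example, "->", split_tag(example))
```

Because only level 1 is mandatory, a tag without an underscore is returned with `level2` set to `None` rather than being rejected.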

 

For disjunctively written languages, all linguistic words are tagged in addition to all orthographic words, resulting in two layers of POS annotation: one for orthographic words and one for linguistic words. For conjunctively written languages such as isiZulu, this extra layer of POS annotation is not needed.

The tagset currently distinguishes 20 categories applicable to isiZulu and two different levels of annotation. However, only level 1 has been annotated. The first part of the tag gives a general indication of the nature of the unit in question. These are as follows:

 

 

| Tag | Explanation |
| --- | --- |
| PUNC | Punctuation |
| ABBR | Abbreviation (incl. acronyms) |
| ADJ | Adjective (incl. enumerative) |
| ADV | Adverb |
| CDEM | Class-indicating demonstrative |
| CONJ | Conjunction |
| COP | Copulative (copulative subject concord, demonstrative copulative, copulative verb) |
| FOR | Foreign |
| IDEO | Ideophone |
| INT | Interjection |
| INTER | Question word |
| N | Noun |
| NPP | Place and brand name |
| NUM | Numerative |
| POSS | Possessive (possessive concord, possessive pronoun) |
| PROEMP | Emphatic pronoun |
| PROQUANT | Quantitative pronoun |
| REL | Relative |
| V | Verbal |
| VAUX | Auxiliary verb |
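Since several category names are prefixes of others (`N`/`NPP`/`NUM`, `V`/`VAUX`, `INT`/`INTER`), mapping a full level-1 tag back to its category needs a longest-prefix match. The sketch below illustrates this with the 20 categories listed above; `CATEGORIES` and `base_category` are hypothetical names, and the function only looks up the category without checking that the rest of the tag is a valid class suffix.

```python
# The 20 categories applicable to isiZulu, as listed in this document.
CATEGORIES = {
    "PUNC": "Punctuation",
    "ABBR": "Abbreviation (incl. acronyms)",
    "ADJ": "Adjective (incl. enumerative)",
    "ADV": "Adverb",
    "CDEM": "Class-indicating demonstrative",
    "CONJ": "Conjunction",
    "COP": "Copulative",
    "FOR": "Foreign",
    "IDEO": "Ideophone",
    "INT": "Interjection",
    "INTER": "Question word",
    "N": "Noun",
    "NPP": "Place and brand name",
    "NUM": "Numerative",
    "POSS": "Possessive",
    "PROEMP": "Emphatic pronoun",
    "PROQUANT": "Quantitative pronoun",
    "REL": "Relative",
    "V": "Verbal",
    "VAUX": "Auxiliary verb",
}


def base_category(level1_tag: str) -> str:
    """Return the category whose name is the longest prefix of the tag.

    Longest-first ordering keeps "INTER" from being read as "INT" and
    "NUM" from being read as "N". The remainder of the tag (class number,
    LOC, PERS, ...) is NOT validated here.
    """
    for category in sorted(CATEGORIES, key=len, reverse=True):
        if level1_tag.startswith(category):
            return category
    raise KeyError(f"no known category for tag {level1_tag!r}")


if __name__ == "__main__":
    for tag in ("CDEM05", "PROEMPLOC", "INTER", "N01a", "VAUX"):
        category = base_category(tag)
        print(f"{tag}: {category} ({CATEGORIES[category]})")
```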

 

 

 

 

Tags not applicable to isiZulu:

| Tag | Explanation |
| --- | --- |
| ASP | Aspectual marker |
| AUX | Auxiliary stem |
| CN | Class-indicating nominal prefix |
| CO | Class-indicating object concord |
| CS | Class-indicating subject concord |
| MNEG | Negative morpheme |
| PART | Particle |
| TENS | Tense marker |

 


PUNCTUATION

Level 1: PUNC

Notes:

Examples:

| Example | Tag |
| --- | --- |
| ; | PUNC |
| ( | PUNC |
| ! | PUNC |
|  | PUNC |

 

ABBREVIATION

Level 1: ABBR

Notes:

Examples:

| Example | Tag |
| --- | --- |
| NGO | ABBR |
| isib. | ABBR |

 

ADJECTIVE

Level 1: ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC

Notes:

Examples:

| Example | Tag |
| --- | --- |
| omunye | ADJ01 |
| eside | ADJ07 |
| kokubili | ADJLOC |
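The "Level 1" lines in this document use a compact range notation such as `ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC`. The sketch below expands such a line into the individual tags, assuming that a hyphenated range stands for every two-digit class number between its end points; that reading is an interpretation of the notation, and `expand_level1` is a hypothetical helper name.

```python
import re
from typing import List

# "PREFIXnn-mm", optionally with the prefix repeated after the hyphen.
_RANGE = re.compile(r"^([A-Z]+)(\d{2})-(?:\1)?(\d{2})$")


def expand_level1(spec: str) -> List[str]:
    """Expand a comma-separated level-1 specification into single tags.

    Assumption: "ADJ01-11" means ADJ01, ADJ02, ..., ADJ11 (zero-padded
    two-digit class numbers). Items that are not ranges, such as ADJ01a
    or ADJLOC, are kept unchanged.
    """
    tags: List[str] = []
    for item in (part.strip() for part in spec.split(",")):
        match = _RANGE.match(item)
        if match:
            prefix = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            tags.extend(f"{prefix}{n:02d}" for n in range(start, end + 1))
        elif item:
            tags.append(item)
    return tags


if __name__ == "__main__":
    adjective_tags = expand_level1("ADJ01-11, ADJ14-15, ADJ01a, ADJ02a, ADJLOC")
    print(len(adjective_tags), adjective_tags)
```

The same helper applies unchanged to the other class-indicating categories (CDEM, N, POSS, PROEMP, PROQUANT), since their level-1 lines follow the same pattern.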

 

ADVERB

Level 1: ADV, ADVLOC

Notes:

Examples:

| Example | Tag |
| --- | --- |
| cishe | ADV |
| ngaphandle | ADV |
| emva | ADVLOC |

 

[CLASS-INDICATING] DEMONSTRATIVE

Level 1: CDEM01-11, CDEM14-15, CDEMLOC

Notes:

 

Examples:

| Example | Tag |
| --- | --- |
| laba | CDEM02 |
| lelo | CDEM05 |
| kulokho | CDEMLOC |

 

CONJUNCTION

Level 1: CONJ

Notes:

Examples:

| Example | Tag |
| --- | --- |
| futhi | CONJ |
| ngabe | CONJ |

 

COPULATIVE

Level 1: COP

Level 2: COP_neg, COP_nil

Notes:

(-be, - and -bilê). For the copulative verb stem -se, the tag COP_neg is used on level 2, as it is for the verb stem -be (< -ba) when it is used in the negative form.

Examples:

| Example | Tag |
| --- | --- |
| ngeke | COP |
| ngumphakathi | COP |

 

FOREIGN

Level 1: FOR

Notes:

Examples:

| Example | Tag |
| --- | --- |
| development | FOR |
| projector | FOR |

 

IDEOPHONE

Level 1: IDEO

Examples:

| Example | Tag |
| --- | --- |
| ngqo | IDEO |
| xaxa | IDEO |

 

INTERJECTION

Level 1: INT

Level 2: INT_neg, INT_nil

Notes:

Examples:

| Example | Tag |
| --- | --- |
| yini | INT |
| hhayi | INT |

 

INTERROGATIVES

Level 1: INTER

Level 2: _man, _time, _loc, _N01a, _N02a

Notes:

Examples:

| Example | Tag |
| --- | --- |
| kuphi | INTER |
| muni | INTER |
| yini | INTER |

 

NOUN

Level 1: N01-11, N14-15, N01a, N02a, NLOC, N00

Level 2: _aug, _dim, _loc, _name, _nil

Notes:

Examples:

| Example | Tag |
| --- | --- |
| umuntu | N01 |
| uhulumeni | N01a |
| umsebenzi | N03 |
| amaphuzu | N06 |
| izindlela | N10 |
| ebantwini | NLOC |
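As an illustration of how the noun inventory could be checked automatically, the sketch below validates a noun tag against the level-1 and level-2 values listed for this category. It assumes that the hyphenated ranges cover every class number in between and that a level-2 value is attached with an underscore directly after the level-1 tag (as in the `V_tr` style used elsewhere in this document); `NOUN_TAG` and `is_noun_tag` are hypothetical names.

```python
import re

# Level 1: classes 01-11 and 14-15, sub-classes 01a and 02a, LOC, and 00.
# Level 2 (optional): aug, dim, loc, name, nil.
# Assumption: level 2 is appended as "_<value>" to the level-1 tag.
NOUN_TAG = re.compile(
    r"N(?:0[1-9]|1[01]|1[45]|0[12]a|LOC|00)"
    r"(?:_(?:aug|dim|loc|name|nil))?"
)


def is_noun_tag(tag: str) -> bool:
    """Return True if the whole string is a noun tag under the assumptions above."""
    return NOUN_TAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ("N01", "N01a", "N10", "NLOC", "N12", "NPP", "N03_dim"):
        print(candidate, is_noun_tag(candidate))
```

Using `fullmatch` rather than `match` matters here: it keeps `NPP` and `NUM`, which also begin with `N`, from being accepted as noun tags.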

 

PLACE AND BRAND NAME

Level 1: NPP

Level 2: NPP_place, NPP_brand

Notes:

Examples:

| Example | Tag |
| --- | --- |
| KwaZulu-Natal | NPP |
| Mars | NPP |

 

NUMERATIVE

Level 1: NUM

Notes:

Examples:

| Example | Tag |
| --- | --- |
| 2005 | NUM |
| 54 | NUM |
| 74(a) | NUM |

 

POSSESSIVE

Level 1: POSS01-11, POSS14-15, POSSLOC, POSSPERS, POSSKA

Level 2: POSSPERS_1pl, POSSPERS_2pl

Notes:

Examples:

| Example | Tag |
| --- | --- |
| wakhe | POSS01 |
| yamaqembu | POSS04 |
| kamasipala | POSSKA |

 

EMPHATIC PRONOUN

Level 1: PROEMP01-11, PROEMP14-15, PROEMPLOC, PROEMPPERS

Level 2: PROEMPPERS_1sg, PROEMPPERS_1pl, PROEMPPERS_2sg, PROEMPPERS_2pl

Notes:

Examples:

| Example | Tag |
| --- | --- |
| yona | PROEMP04 |
| khona | PROEMP15 |
| kukho | PROEMPLOC |

 

QUANTITATIVE PRONOUN

Level 1: PROQUANT01-11, PROQUANT14-15, PROQUANTLOC

Notes:

Examples:

| Example | Tag |
| --- | --- |
| wonke | PROQUANT03 |
| sonke | PROQUANT07 |
| konke | PROQUANT15 |

 

RELATIVE

Level 1: REL

Notes:

Examples:

| Example | Tag |
| --- | --- |
| esithile | REL |
| okumele | REL |

 

VERBAL

Level 1: V

Level 2: V_tr, V_itr, V_dtr

Notes:

Examples:

| Example | Tag |
| --- | --- |
| abamba | V |
| ukubhekana | V |
| kumele | V |

 

AUXILIARY VERB

Level 1: VAUX

Level 2: VAUX_tr, VAUX_itr, VAUX_dtr

Notes:

Examples:

| Example | Tag |
| --- | --- |
| abe | VAUX |
| bese | VAUX |